Parsers as language models for statistical machine translation

Authors

  • Matt Post
  • Daniel Gildea
Abstract

Most work in syntax-based machine translation has been in translation modeling, but there are many reasons why we may instead want to focus on the language model. We experiment with parsers as language models for machine translation in a simple translation model. This approach demands much more of the language models, allowing us to isolate their strengths and weaknesses. We find that unmodified parsers do not improve BLEU scores over n-gram language models, and provide an analysis of their strengths and weaknesses.

Similar resources

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian also has multi-part words. Under Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...

Algorithms for Syntax-Aware Statistical Machine Translation

All of the non-trivial algorithms that are necessary for building and applying a rudimentary syntax-aware statistical machine translation system are generalized parsers. This paper extends the “translation by parsing” architecture by adding two components that are invariably used by state-of-the-art statistical machine translation systems. First, the paper shows how a generic syntax-aware trans...

Machine Translation Using Automatically Inferred Construction-Based Correspondence and Language Models

We discuss the problem of translation in the wider context of the problem of meaning in cognition and describe a structural statistical machine translation (MT) method motivated by philosophical, cognitive, and computational considerations. Our approach relies on a recently published algorithm capable of learning from a raw corpus a limited yet effective grammar that can be used to construct pr...

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are at the core of Machine Translation (MT) engines, as these engines are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

Automatically Improved Category Labels for Syntax-Based Statistical Machine Translation

A common modeling choice in syntax-based statistical machine translation is the use of synchronous context-free grammars, or SCFGs. When training a translation model in a supervised setting, an SCFG is extracted from parallel text that has been statistically word-aligned and parsed by monolingual statistical parsers. However, the set of syntactic category labels used in a monolingual statistica...

Publication date: 2008